Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers
Abstract
The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research efforts seeking to automatically process facsimiles and extract information from them are multiplying, with document layout analysis as a first essential step. Although the identification and categorization of segments of interest in document images have seen significant progress in recent years thanks to deep learning techniques, many challenges remain, among them the use of finer-grained segmentation typologies and the handling of complex, heterogeneous documents such as historical newspapers. Moreover, most approaches consider visual features only, ignoring the textual signal. In this context, we introduce a multimodal approach for the semantic segmentation of historical newspapers that combines visual and textual features. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate, among other things, the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show a consistent improvement of multimodal models over a strong visual baseline, as well as better robustness to high material variance.
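The abstract does not say how the visual and textual features are combined. As a hedged illustration only, one common early-fusion scheme rasterizes the OCR output into an embedding map: each token's vector (for example from a pretrained word-embedding model) is painted into the token's bounding box, and the resulting map is stacked with the image channels before being fed to a fully convolutional segmentation network. This is an assumption for illustration, not necessarily the authors' architecture; the function names, the `(x0, y0, x1, y1, vector)` token format and the toy values are hypothetical.

```python
import numpy as np


def text_embedding_map(height, width, tokens, dim):
    """Paint each OCR token's embedding into its bounding box (hypothetical helper).

    tokens: iterable of (x0, y0, x1, y1, vector) in pixel coordinates,
    with x1 and y1 exclusive; vector must have length `dim`.
    Pixels not covered by any token stay zero.
    """
    fmap = np.zeros((height, width, dim), dtype=np.float32)
    for x0, y0, x1, y1, vec in tokens:
        vec = np.asarray(vec, dtype=np.float32)
        if vec.shape != (dim,):
            raise ValueError(f"expected embedding of length {dim}, got {vec.shape}")
        # Clip the box to the page so slightly off-page OCR coordinates do not fail.
        x0, x1 = max(0, x0), min(width, x1)
        y0, y1 = max(0, y0), min(height, y1)
        # Broadcast the vector over the box; later tokens overwrite overlaps.
        fmap[y0:y1, x0:x1, :] = vec
    return fmap


def early_fusion_input(image, tokens, dim):
    """Stack image channels (H, W, C) with the text map (H, W, dim) -> (H, W, C + dim)."""
    h, w = image.shape[:2]
    text_map = text_embedding_map(h, w, tokens, dim)
    return np.concatenate([image.astype(np.float32), text_map], axis=-1)


# Toy usage: a blank 100x80 RGB "page" with two OCR tokens and 4-d embeddings.
page = np.zeros((100, 80, 3), dtype=np.uint8)
tokens = [
    (10, 5, 40, 15, [0.1, 0.2, 0.3, 0.4]),
    (10, 20, 60, 30, [0.5, 0.6, 0.7, 0.8]),
]
x = early_fusion_input(page, tokens, dim=4)
print(x.shape)  # (100, 80, 7)
```

Painting embeddings into pixel space keeps the textual signal spatially aligned with the image, so an unmodified segmentation network only needs a wider input layer; a late-fusion design that merges separate visual and textual branches would be an equally plausible alternative.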
Similar articles
Combining Textual and Visual Features for Image Retrieval
This paper presents the approaches used by the MIRACLE team to image retrieval at ImageCLEF 2005. Text-based and content-based techniques have been tested, along with combination of both types of methods to improve image retrieval. The text-based experiments defined this year try to use semantic information sources, like thesaurus with semantic data or text structure. On the other hand, content...
Page Stream Segmentation with Convolutional Neural Nets Combining Textual and Visual Features
In recent years, (retro-)digitizing paper-based files became a major undertaking for private and public archives as well as an important task in electronic mailroom applications. As a first step, the workflow involves scanning and Optical Character Recognition (OCR) of documents. Preservation of document contexts of single page scans is a major requirement in this context. To facilitate workflo...
Combining Textual and Visual Clusters for Semantic Image Retrieval and Auto-annotation
In this paper, we propose a novel strategy at an abstract level by combining textual and visual clustering results to retrieve images using semantic keywords and auto-annotate images based on similarity with existing keywords. Our main hypothesis is that images that fall into the same text cluster can be described with common visual features of those images. In this approach, images are first c...
UNAL-NLP: Combining Soft Cardinality Features for Semantic Textual Similarity, Relatedness and Entailment
This paper describes our participation in the SemEval-2014 tasks 1, 3 and 10. We used a uniform approach for addressing all the tasks using the soft cardinality for extracting features from text pairs, and machine learning for predicting the gold standards. Our submitted systems ranked among the top systems in all the tasks and sub-tasks in which we participated. These results confirm the resul...
Towards Semantic Enrichment of Newspapers: A Historical Ecology Use Case
Historical ecology research relies on historical accounts of human-animal interactions to study this interaction through space and time. Newspaper archives are a rich source of information, but require careful querying and filtering to collect the relevant information. Traditionally, this is a laborious manual task. In this position paper, we describe our ongoing work on semantically enriching ...
Journal
Journal title: Journal of Data Mining and Digital Humanities
Year: 2021
ISSN: 2416-5999
DOI: https://doi.org/10.46298/jdmdh.6107